Lesson: Configuring the Match Transform 


Find duplicates based on numerical values or date values.

Date values (example: +/- 35 days):

Name             | Address                                       | DOB        | Comment
Margaret Roberts | 1429 W Elizabeth St., Fort Collins, CO 80522  | 1/23/1945  | Duplicate
Neil Nevue       | 942 California Ave, Salt Lake City, UT 84115  | 1/23/1945  | Not Duplicate
M Roberts        | 1429 W Elizabeth St., Fort Collins, CO 80522  | 2/23/1945  | Duplicate
Margaret Roberts | 1429 W Elizabeth St., Fort Collins, CO 80522  | 12/23/1945 | Not Duplicate

Numerical values:

Desc         | Type | Diameter | Length | Comment
SS Rod 28188 | SS   | 1.81     | 28     | Duplicate
SS Rod 20181 | SS   | 1.81     | 20     | Not Duplicate
SS Rod 30191 | SS   | 1.81     | 30     | Duplicate

Figure 32: Numerical and Date Proximity Matching
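The proximity rules in the figure can be sketched in Python as simple range checks. This is a minimal illustration, not the Match transform's implementation: the function names are hypothetical, and because the numeric threshold in the figure is not legible, the value 2 used below is an assumption chosen only so that the Length column reproduces the figure's Duplicate/Not Duplicate labels.

```python
from datetime import date


def within_numeric_range(a: float, b: float, max_diff: float) -> bool:
    """Return True if two numbers differ by at most max_diff (hypothetical helper)."""
    return abs(a - b) <= max_diff


def within_date_range(d1: date, d2: date, max_days: int) -> bool:
    """Return True if two dates are at most max_days apart (hypothetical helper)."""
    return abs((d1 - d2).days) <= max_days


# Date proximity using the figure's +/- 35 day window.
driver = date(1945, 1, 23)
print(within_date_range(driver, date(1945, 2, 23), 35))   # True: 31 days apart
print(within_date_range(driver, date(1945, 12, 23), 35))  # False: 334 days apart

# Numeric proximity on Length; max_diff=2 is an ASSUMED threshold.
print(within_numeric_range(28, 30, 2))  # True
print(within_numeric_range(28, 20, 2))  # False
```

A proximity check like this covers only one field. In the figure, the Neil Nevue record shares a DOB with the first record but is not a duplicate, so in practice the date comparison is presumably combined with comparisons on the name and address fields.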


Comparing Compound Family Names
Approximate substring matching extends the substring adjustment score option by loosening
some of its prerequisites. It can find new matches, and it can also boost the score of some
existing matches. The Approximate substring option works on only one field, so if the family
names are in separate fields, they must first be concatenated into one field. Knowing how
adjustment scores are calculated may be useful, but it is not necessary. Suppose there are
two strings: Intl. and International. The four characters of Intl appear in International,
giving 8 identical characters across both strings; the other 9 characters of International
are nonidentical; and the total is 4 + 13 = 17 characters. Without any adjustment score,
Score = (number of identical characters * 100) / total number of characters = (8*100)/17 =
47. If the adjustment score is 80, it is applied to the characters that are not identical:
Score = ((number of identical characters * 100) + (number of nonidentical characters * 80)) /
total number of characters = (8*100 + 9*80)/17 = 89.
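The arithmetic above can be reproduced with a small Python function. It is a sketch under stated assumptions: the function name is hypothetical, the character counts are passed in rather than derived from the strings, and rounding to the nearest whole number is assumed because the text reports whole-number scores.

```python
def adjusted_score(identical: int, nonidentical: int, adjustment: int = 0) -> int:
    """Hypothetical sketch of the adjustment-score arithmetic.

    identical:    identical characters, counted in both strings
    nonidentical: characters that are not identical
    adjustment:   score applied to each nonidentical character
                  (0 means no adjustment score is used)
    """
    total = identical + nonidentical
    raw = (identical * 100 + nonidentical * adjustment) / total
    return round(raw)  # ASSUMPTION: the score is rounded to a whole number


# Intl vs. International: 8 identical characters, 9 nonidentical, 17 in total.
print(adjusted_score(8, 9))      # 47 (1520/17 would need the adjustment)
print(adjusted_score(8, 9, 80))  # 89
```

The adjustment only raises the weight of the nonidentical characters from 0 to the adjustment value, which is why the same pair of strings moves from 47 to 89.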





Comparing Compound Family Names — Approximate Substring Matching 


• Mainly supports compound family names, such as those from Brazil and Mexico, but can also
  be applied to non-name data. The data must be in one input field.
• Example: CRUZ RODRIGUEZ and GARCIA CRUZ DE RDZ can match.
• Loosens some requirements of the substring matching option:
  - The first words do not have to match exactly, such as CRUZ and GARCIA.
  - The words that do match can use initials and abbreviation adjustments, such as
    RODRIGUEZ and RDZ.
  - Matching words must be in the same order, but other non-matching words, such as
    GARCIA and DE, can appear around them.
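The loosened rules above can be illustrated with a short Python sketch. It is not the Match transform's algorithm: the function names are hypothetical, an abbreviation is assumed to be a word whose letters appear in order in the longer word and share its first letter, and a match is assumed to require that every word of the shorter name pair up, in order, with some word of the longer name. The in-order pairing is found with a longest-common-subsequence search over words.

```python
def words_match(a: str, b: str) -> bool:
    """Hypothetical word comparison: exact, initial, or abbreviation."""
    if a == b:
        return True
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if len(short) == 1:
        # Initial, e.g. R vs. RODRIGUEZ
        return long_.startswith(short)
    if short[0] != long_[0]:
        return False
    # ASSUMED abbreviation rule: letters of the short word appear in order
    # in the long word (RDZ within RODRIGUEZ). `ch in it` consumes the iterator.
    it = iter(long_)
    return all(ch in it for ch in short)


def ordered_matches(x: list[str], y: list[str]) -> int:
    """Longest in-order pairing of words (LCS using words_match as equality)."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            if words_match(x[i - 1], y[j - 1]):
                best = max(best, dp[i - 1][j - 1] + 1)
            dp[i][j] = best
    return dp[-1][-1]


def approx_substring_match(name1: str, name2: str) -> bool:
    """ASSUMED rule: all words of the shorter name match, in order.

    Both names are single strings, since the data must be in one input field.
    """
    a, b = name1.split(), name2.split()
    shorter = min(len(a), len(b))
    return shorter > 0 and ordered_matches(a, b) == shorter


print(approx_substring_match("CRUZ RODRIGUEZ", "GARCIA CRUZ DE RDZ"))  # True
print(approx_substring_match("RODRIGUEZ CRUZ", "GARCIA CRUZ DE RDZ"))  # False
```

Under these assumptions, CRUZ RODRIGUEZ matches GARCIA CRUZ DE RDZ even though GARCIA and DE have no counterpart, while RODRIGUEZ CRUZ does not match, because its matching words appear in the opposite order.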